A Bias Correction for the Minimum Error Rate in Cross-validation
Abstract
Tuning parameters in supervised learning problems are often estimated by cross-validation. The minimum value of the cross-validation error can be biased downward as an estimate of the test error at that same value of the tuning parameter. We propose a simple method for the estimation of this bias that uses information from the cross-validation process. As a result, it requires essentially no additional computation. We apply our bias estimate to a number of popular classifiers in various settings, and examine its performance.
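The abstract does not spell out the estimator, so what follows is only a minimal sketch of the idea under our own assumptions. Let e_k(θ_j) be the error on held-out fold k at tuning value θ_j (hypothetical notation). The CV curve is taken as the plain average of these over folds, which assumes equal-sized folds, and θ̂ is its minimizer. One bias estimate that needs nothing beyond these fold-level curves is the average over folds of e_k(θ̂) − min_j e_k(θ_j), that is, how far each fold's own minimum undercuts its error at the selected value. The function names and the sample error matrix are hypothetical, and this should not be read as the paper's exact formula.

```python
from typing import List, Sequence, Tuple


def cv_curve(fold_errors: Sequence[Sequence[float]]) -> List[float]:
    """Average the per-fold error curves (assumes equal-sized folds)."""
    n_folds = len(fold_errors)
    n_values = len(fold_errors[0])
    if any(len(row) != n_values for row in fold_errors):
        raise ValueError("every fold needs one error per tuning value")
    return [sum(row[j] for row in fold_errors) / n_folds for j in range(n_values)]


def bias_corrected_cv(
    fold_errors: Sequence[Sequence[float]],
) -> Tuple[int, float, float, float]:
    """Return (selected index, minimum CV error, bias estimate, corrected error).

    ASSUMPTION (illustrative, not taken from the paper): the bias is the
    average over folds of e_k(theta_hat) - min_j e_k(theta_j).
    """
    curve = cv_curve(fold_errors)
    # Index of the tuning value that minimizes the averaged CV curve.
    j_hat = min(range(len(curve)), key=lambda j: curve[j])
    # Per fold: error at the selected value minus that fold's own best error.
    gaps = [row[j_hat] - min(row) for row in fold_errors]
    bias = sum(gaps) / len(gaps)
    return j_hat, curve[j_hat], bias, curve[j_hat] + bias


# Hypothetical error rates: 3 folds (rows) x 4 tuning values (columns).
errors = [
    [0.30, 0.20, 0.25, 0.35],
    [0.28, 0.24, 0.18, 0.30],
    [0.32, 0.22, 0.26, 0.20],
]
j_hat, cv_min, bias, corrected = bias_corrected_cv(errors)
print(f"theta index {j_hat}: CV min {cv_min:.3f}, bias {bias:.3f}, corrected {corrected:.3f}")
```

Because each fold's minimum can be no larger than its error at θ̂, this estimate is never negative, so the corrected value is always at least the raw CV minimum, which matches the downward direction of the bias described above. It only reuses the fold-by-tuning-value error matrix already produced during cross-validation, consistent with the claim that the correction requires essentially no additional computation.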
Similar resources
Correcting the optimally selected resampling-based error rate: A smooth analytical alternative to nested cross-validation
High-dimensional binary classification tasks, e.g. the classification of microarray samples into normal and cancer tissues, usually involve a tuning parameter adjusting the complexity of the applied method to the examined data set. By reporting the performance of the best tuning parameter value only, over-optimistic prediction errors are published. The contribution of this paper is twofold. Fi...
Bias correction for selecting the minimal-error classifier from many machine learning models
Motivation: Supervised machine learning is commonly applied in genomic research to construct a classifier from the training data that is generalizable to predict independent testing data. When test datasets are not available, cross-validation is commonly used to estimate the error rate. Many machine learning methods are available, and it is well known that no universally best method exists in ge...
A novel method for reducing Rician noise in the magnitude of the diffusion signal in magnetic resonance imaging (MRI)
The true MR signal intensity extracted from noisy MR magnitude images is biased by Rician noise, caused by noise rectification in the magnitude calculation for low-intensity pixels. This noise is more problematic when a quantitative analysis is performed based on magnitude images with low SNR (< 3.0). In such cases, the received signal for both the real and imaginary components will fluc...
Selection bias in gene extraction on the basis of microarray gene-expression data.
In the context of cancer diagnosis and treatment, we consider the problem of constructing an accurate prediction rule on the basis of a relatively small number of tumor tissue samples of known type containing the expression data on very many (possibly thousands) genes. Recently, results have been presented in the literature suggesting that it is possible to construct a prediction rule from only...